13 research outputs found

    Parallel Voronoi Computation for Physics-Based Simulations

    Voronoi diagrams are fundamental data structures in computational geometry, with applications in areas such as physics-based simulation. For non-Euclidean distances, the Voronoi diagram must be computed over a grid graph whose edges encode the required distance information. The major bottleneck in this case is a shortest-path algorithm that must be run multiple times during the simulation. We present a GPU algorithm for solving the shortest-path problem from multiple sources using a generalized distance function. Our algorithm is designed to leverage the grid-based nature of the underlying graph that represents the deformable objects. Experimental results show speed-ups of up to 65× over a current sequential reference method.
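
    The abstract above describes computing a Voronoi diagram through multi-source shortest paths on a grid graph. As a rough illustration only, the sketch below is a sequential multi-source Dijkstra in Python, not the GPU algorithm from the paper. The function name `grid_voronoi`, the 4-connected 2D grid, and the `cost` callback standing in for a generalized distance function are all assumptions made for this example.

```python
import heapq


def grid_voronoi(width, height, seeds, cost):
    """Sequential multi-source Dijkstra on a 4-connected 2D grid (illustrative).

    seeds: list of (x, y) cells acting as Voronoi sites.
    cost:  hypothetical callback cost(a, b) -> positive weight of the edge
           between neighbouring cells a and b (the "generalized distance").
    Returns (dist, label): shortest distance to the nearest seed and the
    index of that seed, both keyed by cell.
    """
    dist = {}
    label = {}
    heap = []
    for i, seed in enumerate(seeds):
        dist[seed] = 0.0
        label[seed] = i
        heapq.heappush(heap, (0.0, i, seed))

    while heap:
        d, i, (x, y) = heapq.heappop(heap)
        if d > dist[(x, y)]:
            continue  # stale queue entry; a shorter path was already found
        for nx, ny in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            if 0 <= nx < width and 0 <= ny < height:
                nd = d + cost((x, y), (nx, ny))
                if nd < dist.get((nx, ny), float("inf")):
                    dist[(nx, ny)] = nd
                    label[(nx, ny)] = i  # cell now belongs to seed i's region
                    heapq.heappush(heap, (nd, i, (nx, ny)))
    return dist, label
```

    Calling `grid_voronoi(5, 1, [(0, 0), (4, 0)], lambda a, b: 1.0)` labels each cell with the index of its closest seed under unit edge costs; swapping in a different `cost` changes the metric without touching the traversal.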

    Parallel Shortest Path Algorithm for Voronoi Diagrams with Generalized Distance Functions

    Voronoi diagrams are fundamental data structures in computational geometry, with applications in many areas. Recent soft-object simulation algorithms for real-time physics engines require the computation of Voronoi diagrams over 3D images with non-Euclidean distances. In this case, the computation must be performed over a graph whose edges encode the required distance information. However, the excessive computation time of Voronoi diagrams prevents more sophisticated deformations that require interactive topological changes, such as the cutting or stitching used in virtual surgery simulations. The major bottleneck in the Voronoi computation in this case is a shortest-path algorithm that must be run multiple times during the deformation. In this paper, we tackle this problem by proposing a GPU algorithm for the shortest-path problem from multiple sources using generalized distance functions. Our algorithm is designed to leverage the grid-based nature of the underlying graph used in the simulation. Experimental results show speed-ups of up to 65× over a current sequential reference method.

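    Mecanismos para a produção de conteúdo interoperável entre web, TV digital e dispositivos de telefonia móvel (Mechanisms for producing interoperable content across the web, digital TV, and mobile phone devices)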

    This work presents a standardization proposal for the unified development of applications that are interoperable across the web, digital TV, and mobile phones. The main goal is to avoid redundancy in content production by focusing on adaptation between the different platforms instead of creating and maintaining content exclusively for each one. Recommendations based on international standards are presented, seeking to minimize disruption to current processes for creating learning objects. To validate the proposal, two interoperable applications were developed: one involving only text and menus, and the other, much more complex, involving a complete teaching course. Both applications successfully validated the proposal.

    Algorithmes et structures de données parallèles pour applications interactives (Parallel algorithms and data structures for interactive applications)

    The quest for performance has been a constant throughout the history of computing systems. It has been more than a decade since the sequential processing model showed its first signs of exhaustion in sustaining performance improvements. Walls to sequential computation pushed a paradigm shift and established parallel processing as the standard in modern computing systems. With the widespread adoption of parallel computers, many algorithms and applications have been ported to fit these new architectures. However, in unconventional applications with interactivity and real-time requirements, achieving efficient parallelization is still a major challenge. Real-time performance requirements show up, for instance, in user-interactive simulations, where the system must be able to react to the user's input within one computation time step of the simulation loop. The same kind of constraint appears in streaming data monitoring applications, for instance when an external source of data, such as traffic sensors or social media posts, provides a continuous flow of information to be consumed by an online analysis system. The consumer system has to keep a controlled memory budget and quickly deliver processed information about the stream. Common optimizations relying on pre-computed models or static indexes of the data are not possible in these highly dynamic scenarios. The dynamic nature of the data raises several performance issues originating from the problem decomposition for parallel processing and from maintaining data locality for efficient cache utilization. In this thesis we address data-dependent problems in two different applications: one in physics-based simulation and the other in streaming data analysis. For the simulation problem, we present a parallel GPU algorithm for computing multiple shortest paths and Voronoi diagrams on a grid-like graph. For the streaming data analysis problem, we present a parallelizable data structure, based on packed memory arrays, for indexing dynamic geo-located data while keeping good memory locality.

    Roubo de trabalho em processadores gráficos (Work stealing on graphics processors)

    Graphics processing units (GPUs) have become a valuable support for high-performance computing (HPC) applications. However, despite the many improvements in general-purpose GPUs (GPGPUs), there is still a need for a generic programming model adaptable to the many forms of parallelism that an application can express. The CUDA programming model is widely used in the GPGPU domain but is very limited in aspects such as load balancing and task parallelism. This work introduces a new programming model for general-purpose graphics processors. We propose a hybrid model combining task and data parallelism, which extends the range of applications that can make efficient use of graphics processors. We implement a work-stealing scheduler in CUDA to schedule tasks efficiently inside a GPU while keeping an even load balance among its multiprocessors. Finally, we evaluate the performance of our work-stealing scheduler by comparing it with static and list scheduling methods applied to the problems of array transformation and octree partitioning.
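
    To make the work-stealing idea in the abstract above concrete, here is a small single-threaded Python simulation. It is only a sketch of the general technique, not the CUDA scheduler the thesis describes: the name `run_work_stealing`, the round-robin stepping, and the "steal from the longest queue" victim choice are assumptions for this example (real schedulers often pick victims at random and run workers concurrently).

```python
from collections import deque


def run_work_stealing(initial_tasks, num_workers):
    """Round-robin simulation of work stealing (illustrative, not concurrent).

    All tasks start on worker 0. Each step, a worker runs the newest task
    from its own deque; if its deque is empty it steals the oldest task
    from the currently longest deque (an assumed victim policy).
    Returns the list of tasks executed by each worker.
    """
    queues = [deque() for _ in range(num_workers)]
    queues[0].extend(initial_tasks)
    done = [[] for _ in range(num_workers)]

    while any(queues):  # an empty deque is falsy
        for w in range(num_workers):
            if queues[w]:
                task = queues[w].pop()  # owner works LIFO on its own end
            else:
                victim = max(range(num_workers), key=lambda v: len(queues[v]))
                if not queues[victim]:
                    continue  # nothing left to steal anywhere
                task = queues[victim].popleft()  # thief takes the oldest task
            done[w].append(task)
    return done
```

    With `run_work_stealing([1, 2, 3, 4, 5, 6], 2)`, worker 0 drains its deque from one end while worker 1 steals from the other, so the six tasks end up split evenly even though they all started on one worker.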

    Parallel algorithms and data structures for interactive data problems

    The quest for performance has been a constant throughout the history of computing systems. It has been more than a decade since the sequential processing model showed its first signs of exhaustion in sustaining performance improvements. Walls to sequential computation pushed a paradigm shift and established parallel processing as the standard in modern computing systems. With the widespread adoption of parallel computers, many algorithms and applications have been ported to fit these new architectures. However, in unconventional applications with interactivity and real-time requirements, achieving efficient parallelization is still a major challenge. Real-time performance requirements show up, for instance, in user-interactive simulations, where the system must be able to react to the user's input within one computation time step of the simulation loop. The same kind of constraint appears in streaming data monitoring applications, for instance when an external source of data, such as traffic sensors or social media posts, provides a continuous flow of information to be consumed by an online analysis system. The consumer system has to keep a controlled memory budget and quickly deliver processed information about the stream. Common optimizations relying on pre-computed models or static indexes of the data are not possible in these highly dynamic scenarios. The dynamic nature of the data raises several performance issues originating from the problem decomposition for parallel processing and from maintaining data locality for efficient cache utilization. In this thesis we address data-dependent problems in two different applications: one in physics-based simulation and the other in streaming data analysis. For the simulation problem, we present a parallel GPU algorithm for computing multiple shortest paths and Voronoi diagrams on a grid-like graph. For the streaming data analysis problem, we present a parallelizable data structure, based on packed memory arrays, for indexing dynamic geo-located data while keeping good memory locality.

    A New Programming Paradigm for GPGPU

    Graphics processing units (GPUs) have become a valuable support for high-performance computing (HPC) applications. However, despite the many improvements in general-purpose GPUs, the programming paradigms currently available, such as NVIDIA's CUDA, are still low-level and require considerable programming effort, especially for irregular applications where dynamic load balancing is key to reaching high performance. This paper introduces a new hybrid programming scheme for general-purpose graphics processors that uses two levels of parallelism. At the upper level, a program lazily creates tasks to be scheduled on the different Streaming Multiprocessors (MPs), as defined in NVIDIA's architecture. We have embedded a well-known work-stealing algorithm inside the GPU to balance the workload dynamically. At the lower level, tasks exploit each Streaming Processor (SP) following a data-parallel approach. Preliminary comparisons on data-parallel iteration over vectors show that, on regular workloads, this approach is competitive with the standard CUDA library Thrust, which is based on static scheduling. On irregular workloads, our approach outperforms Thrust-based scheduling.

    Packed-Memory Quadtree: a cache-oblivious data structure for visual exploration of streaming spatiotemporal big data

    The visual analysis of large multidimensional spatiotemporal datasets poses challenging questions regarding storage requirements and query performance. Several data structures have recently been proposed to address these problems; they rely on indexes that pre-compute different aggregations from a dataset known a priori. Consider now the problem of handling streaming datasets, in which data arrive as one or more continuous data streams. Such datasets introduce challenges for the data structure, which now has to support dynamic updates (insertions/deletions) and rebalancing operations to reorganize itself. In this work, we present the Packed-Memory Quadtree (PMQ), a novel data structure designed to support visual exploration of streaming spatiotemporal datasets. PMQ is cache-oblivious, so that it performs well under different cache configurations. We store streaming data in an internal index that keeps a spatiotemporal ordering over the data following a quadtree representation, with support for real-time insertions and deletions. We validate our data structure under different dynamic scenarios and compare it with competing strategies. We demonstrate how PMQ can be used to answer different types of visual spatiotemporal range queries over streaming datasets.
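
    The abstract above mentions keeping an ordering over the data that follows a quadtree representation. One common way to obtain such an ordering is a Z-order (Morton) code; whether PMQ uses exactly this encoding is an assumption here, and the helper names `morton2d` and `insert_point` are hypothetical. In this Python sketch a plain sorted list stands in for the packed memory array, which in a real implementation would leave gaps between elements to make insertions cheaper.

```python
import bisect


def morton2d(x, y, bits=16):
    """Interleave the low `bits` bits of x and y into a Z-order code.

    Cells sharing a quadtree ancestor share a code prefix, so sorting by
    this code groups points quadrant by quadrant.
    """
    code = 0
    for i in range(bits):
        code |= ((x >> i) & 1) << (2 * i)       # x bits go to even positions
        code |= ((y >> i) & 1) << (2 * i + 1)   # y bits go to odd positions
    return code


def insert_point(index, x, y):
    """Insert (x, y) into a list kept sorted by Morton code."""
    bisect.insort(index, (morton2d(x, y), x, y))
```

    After inserting a stream of points with `insert_point`, a range of consecutive entries in `index` corresponds to one quadtree cell, which is what makes spatial range queries over the sorted array possible.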